Introduction


When presenting a data overview or exploratory analysis results, we used to copy lots of tables and charts from RStudio into PowerPoint, which made preparing a presentation painful. It has become essential for data scientists to use better reporting tools, such as R Markdown or Jupyter notebooks, to author analysis presentations in a more efficient and organized way. Of course, we also want the result to be reproducible!

In this post, I would like to share some tips I picked up while building analysis reports with R Markdown/notebooks.

R markdown


Yihui Xie provides a comprehensive, up-to-date R Markdown guide at https://bookdown.org/yihui/rmarkdown/, which explains how to build a table of contents, fold code snippets, configure the YAML header, and more.

Tabs


One thing I found especially useful is the tabbed section. A tab layout helps condense parallel, lengthy content in the report.

Simply put the {.tabset} attribute after a markdown header, and its sub-headers will become tabs. The following snippet gives an example:

# Tabs {.tabset}

## Header2 - Tab1
this is tab1

## Header2 - Tab2
this is tab2
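
When the number of tabs depends on the data, the sub-headers can also be generated from R. The following is a minimal sketch, assuming the chunk is placed directly under a header carrying {.tabset} and uses the chunk option results = 'asis'; tab_markdown is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: build one "## ..." sub-header (one tab) per group,
# each followed by a small pipe table of that group's first rows.
tab_markdown <- function(df, by) {
  groups <- sort(unique(df[[by]]))
  parts <- vapply(groups, function(g) {
    rows <- head(df[df[[by]] == g, ], 3)
    tbl <- paste(knitr::kable(rows, format = "pipe"), collapse = "\n")
    paste0("## ", by, " = ", g, "\n\n", tbl, "\n")
  }, character(1))
  paste(parts, collapse = "\n")
}

# With results = 'asis', the printed markdown is rendered as tabs.
cat(tab_markdown(mtcars, "gear"))
```

Emitting raw markdown this way keeps the report source short when there are many parallel groups to show.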


Tables


Native markdown tables aren’t very user-friendly to write by hand, so we use functions such as knitr::kable or DT::datatable to render a table from a data.frame.

Here are some tips on choosing between kable and datatable.

  • kable has simpler syntax and gives more appealing, “table-like” tables in most themes.
  • datatable has more capabilities, such as paged tables with download buttons. Further configuration options are described in the DataTables JavaScript API reference.

In a nutshell, kable is preferable for smaller tables, while datatable is preferable for bigger ones.
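
This rule of thumb can be sketched as a small wrapper. The function name render_table and the 20-row threshold are assumptions made for illustration only; pick whatever cutoff suits your report.

```r
# Hypothetical helper: static kable for small tables, paged datatable otherwise.
# The max_rows threshold is an arbitrary assumption, not a recommendation
# from either package.
render_table <- function(df, max_rows = 20) {
  if (nrow(df) <= max_rows) {
    knitr::kable(df, digits = 1)
  } else {
    DT::datatable(df, options = list(pageLength = 10))
  }
}

render_table(head(mtcars))  # 6 rows: rendered with kable
render_table(mtcars)        # 32 rows: rendered as a paged datatable
```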

markdown table

some random markdown table

| Tables   |      Are      |  Cool |
|----------|:-------------:|------:|
| col 1 is |  left-aligned | $1600 |
| col 2 is |    centered   |   $12 |
| col 3 is | right-aligned |    $1 |

kable

kable comes from the knitr package; kableExtra adds the styling helpers.

library(knitr)
library(kableExtra)
library(dplyr)  # provides the %>% pipe
mtcars %>%
  head() %>%
  kable(digits = 1, caption = 'example of kable table') %>%
  kable_styling(full_width = FALSE, position = 'left') %>%
  # row 0 is the header row
  row_spec(0,
           bold = TRUE,
           color = 'white',
           background = 'black')
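
example of kable table

|                   |  mpg | cyl | disp |  hp | drat |  wt | qsec | vs | am | gear | carb |
|-------------------|-----:|----:|-----:|----:|-----:|----:|-----:|---:|---:|-----:|-----:|
| Mazda RX4         | 21.0 |   6 |  160 | 110 |  3.9 | 2.6 | 16.5 |  0 |  1 |    4 |    4 |
| Mazda RX4 Wag     | 21.0 |   6 |  160 | 110 |  3.9 | 2.9 | 17.0 |  0 |  1 |    4 |    4 |
| Datsun 710        | 22.8 |   4 |  108 |  93 |  3.9 | 2.3 | 18.6 |  1 |  1 |    4 |    1 |
| Hornet 4 Drive    | 21.4 |   6 |  258 | 110 |  3.1 | 3.2 | 19.4 |  1 |  0 |    3 |    1 |
| Hornet Sportabout | 18.7 |   8 |  360 | 175 |  3.1 | 3.4 | 17.0 |  0 |  0 |    3 |    2 |
| Valiant           | 18.1 |   6 |  225 | 105 |  2.8 | 3.5 | 20.2 |  1 |  0 |    3 |    1 |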

datatable

datatable comes from the DT package, an R interface to the DataTables JavaScript library.

  • JS - DataTables options list: https://datatables.net/reference/option/
  • R - DT package: https://rstudio.github.io/DT/

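library(DT)
mtcars %>%
  datatable(
    class = 'cell-border stripe',
    extensions = 'Buttons',
    options = list(
      # B = buttons, f = filter box, r = processing, t = table,
      # i = info summary, p = pagination
      dom = 'Bfrtip',
      buttons = c('copy', 'csv', 'print')
    )
  )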

Data Summary


summarytools

library(summarytools)
# use the chunk option results = 'asis' so the grid table is rendered
mtcars %>%
  dfSummary(style = 'grid',
            graph.magnif = 0.75,
            plain.ascii = FALSE,
            valid.col = FALSE,
            tmp.img.dir = "/tmp") %>%
  print()

Data Frame Summary

mtcars

Dimensions: 32 x 11
Duplicates: 0

| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---:|----------|----------------|--------------------|---------|
| 1 | mpg [numeric] | Mean (sd): 20.1 (6); min < med < max: 10.4 < 19.2 < 33.9; IQR (CV): 7.4 (0.3) | 25 distinct values | 0 (0%) |
| 2 | cyl [numeric] | Mean (sd): 6.2 (1.8); min < med < max: 4 < 6 < 8; IQR (CV): 4 (0.3) | 4: 11 (34.4%); 6: 7 (21.9%); 8: 14 (43.8%) | 0 (0%) |
| 3 | disp [numeric] | Mean (sd): 230.7 (123.9); min < med < max: 71.1 < 196.3 < 472; IQR (CV): 205.2 (0.5) | 27 distinct values | 0 (0%) |
| 4 | hp [numeric] | Mean (sd): 146.7 (68.6); min < med < max: 52 < 123 < 335; IQR (CV): 83.5 (0.5) | 22 distinct values | 0 (0%) |
| 5 | drat [numeric] | Mean (sd): 3.6 (0.5); min < med < max: 2.8 < 3.7 < 4.9; IQR (CV): 0.8 (0.1) | 22 distinct values | 0 (0%) |
| 6 | wt [numeric] | Mean (sd): 3.2 (1); min < med < max: 1.5 < 3.3 < 5.4; IQR (CV): 1 (0.3) | 29 distinct values | 0 (0%) |
| 7 | qsec [numeric] | Mean (sd): 17.8 (1.8); min < med < max: 14.5 < 17.7 < 22.9; IQR (CV): 2 (0.1) | 30 distinct values | 0 (0%) |
| 8 | vs [numeric] | Min: 0; Mean: 0.4; Max: 1 | 0: 18 (56.2%); 1: 14 (43.8%) | 0 (0%) |
| 9 | am [numeric] | Min: 0; Mean: 0.4; Max: 1 | 0: 19 (59.4%); 1: 13 (40.6%) | 0 (0%) |
| 10 | gear [numeric] | Mean (sd): 3.7 (0.7); min < med < max: 3 < 4 < 5; IQR (CV): 1 (0.2) | 3: 15 (46.9%); 4: 12 (37.5%); 5: 5 (15.6%) | 0 (0%) |
| 11 | carb [numeric] | Mean (sd): 2.8 (1.6); min < med < max: 1 < 2 < 8; IQR (CV): 2 (0.6) | 1: 7 (21.9%); 2: 10 (31.2%); 3: 3 (9.4%); 4: 10 (31.2%); 6: 1 (3.1%); 8: 1 (3.1%) | 0 (0%) |

(Graph column: inline histograms, not shown.)

Static Plots

ggplot2 is our best friend for visualization in R, and it is well supported in R Markdown. Chaining functions with %>% and + keeps the code chunk tidy and readable.

Often we would like to combine several sub-plots into one. ggplot2::facet_grid can do part of the job, but I found ggpubr::ggarrange more powerful: it lets you combine arbitrary plots, and even tables. It’s cool to put a chart and a table side by side (an example is given in a subsequent section).

ggridges is another useful ggplot2 extension that draws multiple density plots in a single chart. This is often used to compare distributions between groups. See the details here: https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html
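
As a quick illustration of the ridgeline idea, here is a minimal sketch on mtcars. The choice of mpg grouped by cyl is my own assumption for demonstration; any numeric variable split by a grouping factor works the same way.

```r
library(ggplot2)
library(ggridges)

# One density ridge of mpg per cylinder count, stacked on the y axis.
ridge_plot <- ggplot(mtcars, aes(x = mpg, y = factor(cyl), fill = factor(cyl))) +
  geom_density_ridges(alpha = 0.7) +
  labs(x = 'mpg', y = 'cyl', title = 'example of ggridges in R markdown') +
  theme(legend.position = 'none')

ridge_plot
```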


ggplot

library(ggplot2)
library(dplyr)  # filter() and the %>% pipe
# correlation matrix reshaped to long format, keeping one triangle
cor(mtcars) %>%
  as.data.frame() %>%
  tibble::rownames_to_column('var1') %>%
  tidyr::pivot_longer(-var1, names_to = 'var2', values_to = 'cor') %>%
  filter(var1 <= var2) %>%
  ggplot(aes(x = var1, y = var2, fill = cor, label = round(cor, 2))) +
  geom_tile() +
  geom_text() +
  scale_fill_gradient2() +
  labs(title = 'example of ggplot2 in R markdown')

ggpubr

combine multiple charts or tables

library(dplyr)
library(ggplot2)
library(ggpubr)
library(forcats)
# .groups = 'drop' returns an ungrouped result and silences the dplyr message
data <- mtcars %>%
  group_by(gear) %>%
  summarise(n = n(), .groups = 'drop') %>%
  mutate(gear = fct_rev(factor(gear)))

plt <- data %>%
  ggplot(aes(x = gear, y = n)) +
  geom_bar(stat = 'identity', fill = 'lightblue') +
  coord_flip() +
  labs(title = 'example of combined table and plot using ggpubr')

# render the summary as a table graphic and place it beside the chart
tbl <- ggtexttable(data, rows = NULL)
ggarrange(plt, tbl, ncol = 2, nrow = 1, widths = c(2, 1))

Interactive Plots


This is where things become tricky. Interactive plots are only supported in HTML output, and there is no dominant interactive visualization package in the R ecosystem.

  • plotly provides comprehensive chart types, documentation, and cross-language support. However, I personally don’t like its style, its syntax, or the toolbox at the upper right corner.

  • echarts4r is an R interface to the ECharts JavaScript library, which was open sourced by Baidu. I have tested version 0.3.2, the latest at the time of writing, and it works well with R Markdown.

  • googleVis is an R interface to the Google Charts JavaScript library, which was of course developed by Google. The package has a good collection of chart types, but it has some incompatibility with both R Markdown and Shiny whose cause is unclear to me. I’ve found a workaround to integrate googleVis charts in R Markdown, but it’s not perfect.
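
Because interactive widgets only work in HTML output, one option is to fall back to a static plot for other formats. The sketch below assumes the plot is built with ggplot2 and converted with plotly; show_plot is a hypothetical helper name.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg, y = wt, color = factor(am))) +
  geom_point()

# Hypothetical helper: interactive widget when knitting to HTML,
# the plain ggplot object otherwise (e.g. PDF or Word output).
show_plot <- function(plot) {
  if (knitr::is_html_output()) {
    plotly::ggplotly(plot)
  } else {
    plot
  }
}

show_plot(p)
```

This keeps a single source document usable for both HTML and non-HTML targets.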

plotly

Plotly R documentation site: https://plotly.com/r/

library(plotly)
library(dplyr)  # mutate() and the %>% pipe
p <- mtcars %>%
  mutate(am = factor(am)) %>%
  ggplot(aes(x = mpg, y = wt, color = am)) +
  geom_point() +
  labs(title = 'example of plotly in R markdown')
# convert the static ggplot into an interactive plotly widget
ggplotly(p)

Echarts

library(echarts4r)
df <- data.frame(
  x = LETTERS[1:5],
  y = runif(5, 1, 5),
  z = runif(5, 3, 7)
)

df %>% 
  e_charts(x) %>% 
  e_radar(y, max = 7, name = "radar") %>%
  e_radar(z, max = 7, name = "chart") %>%
  e_tooltip(trigger = "item")

Google Charts

Setting self_contained: false is required for googleVis charts to render in R Markdown; refer to the GitHub issue here.

output: 
  html_document:
    self_contained: false

# self_contained: false is required for googleVis charts to render in R markdown
suppressPackageStartupMessages(library(googleVis))
# emit only the chart markup instead of opening a browser page
op <- options(gvis.plot.tag = "chart")
plot(gvisHistogram(dino))